101 research outputs found

    Blockchain for Genomics: A Systematic Literature Review

    Human genomic data carry unique information about an individual and offer unprecedented opportunities for healthcare. The clinical interpretations derived from large genomic datasets can greatly improve healthcare and pave the way for personalized medicine. Sharing genomic datasets, however, poses major challenges, because genomic data differ from traditional medical data: they indirectly reveal information about descendants and relatives of the data owner and remain valid even after the owner passes away. Therefore, stringent data ownership and control measures are required when dealing with genomic data. To provide a secure and accountable infrastructure, blockchain technologies offer a promising alternative to traditional distributed systems. Indeed, research on blockchain-based infrastructures tailored to genomics is on the rise. However, a comprehensive literature review that summarizes the current state-of-the-art methods in the applications of blockchain in genomics is lacking. In this paper, we systematically review the existing work, both commercial and academic, and discuss the major opportunities and challenges. Our study is driven by five research questions that we aim to answer in our review. We also present our projections of future research directions, which we hope will benefit researchers interested in the area.

    OntoCAT - a simpler way to access ontology resources

    OntoCAT is an open source package developed to simplify the task of querying heterogeneous ontology resources. It supports local ontologies in OBO and OWL format as well as the public repositories NCBO BioPortal and EBI Ontology Lookup Service (OLS). It is available from http://ontocat.sourceforge.net.

    Validation of New Gene Variant Classification Methods: a Field-Test in Diagnostic Cardiogenetics

    Background: In the molecular genetic diagnostics of Mendelian disorders, solutions are needed for the major challenge of dealing with the large number of variants of uncertain significance (VUSs) identified using next-generation sequencing (NGS). Recently, promising approaches have been reported that use constraint metrics to calculate case excess scores (CE), etiological fractions (EF), and gnomAD-derived constraint scores, which estimate the likelihood that rare variants in specific genes or regions are pathogenic. Our objective is to study the usability of these constraint data in variant interpretation in a diagnostic setting, using our cardiomyopathy cohort. Methods and Results: Patients (N = 2002) referred for clinical genetic diagnostics underwent NGS testing of 55–61 genes associated with cardiomyopathies. Previously classified likely pathogenic (LP) and pathogenic (P) variants were used to validate the use of data from CE, EF, and gnomAD constraint analyses for (re)classification of associated variant types in specific cardiomyopathy subtype-related genes. The classifications were corroborated in 94% (354/378) of cases. Next, we reclassified 23 unique VUSs to LP, increasing the diagnostic yield by 1.2%. In addition, 106 unique VUSs (5.3% of patients) were prioritized for co-segregation or functional analyses. Conclusions: Our analysis confirms that the use of constraint metrics data can improve variant interpretation, and we therefore recommend using constraint scores on other cohorts and disorders and including them in variant interpretation protocols.

    Feasibility of predicting allele specific expression from DNA sequencing using machine learning

    Allele specific expression (ASE) concerns divergent expression quantity of alternative alleles and is measured by RNA sequencing. Multiple studies show that ASE plays a role in hereditary diseases by modulating penetrance or phenotype severity. However, genome diagnostics is based on DNA sequencing and therefore neglects gene expression regulation such as ASE. To take advantage of ASE in the absence of RNA sequencing, it must be predicted using only DNA variation. We have constructed ASE models from BIOS (n = 3432) and GTEx (n = 369) that predict ASE using DNA features. These models are highly reproducible and comprise many different feature types, highlighting the complex regulation that underlies ASE. We applied the BIOS-trained model to population variants in three genes in which ASE plays a clinically relevant role: BRCA2, RET and NF1. This resulted in predicted ASE effects for 27 variants, of which 10 were known pathogenic variants. We demonstrated that ASE can be predicted from DNA features using machine learning. Future efforts may improve sensitivity and translate these models into a new type of genome diagnostic tool that prioritizes candidate pathogenic variants or regulators thereof for follow-up validation by RNA sequencing. All code and machine learning models used are available on GitHub and Zenodo.

    MYO5B, STX3, and STXBP2 mutations reveal a common disease mechanism that unifies a subset of congenital diarrheal disorders: A mutation update

    Microvillus inclusion disease (MVID) is a rare but fatal autosomal recessive congenital diarrheal disorder caused by MYO5B mutations. In 2013, we launched an open-access registry for MVID patients and their MYO5B mutations (www.mvid-central.org). Since then, additional unique MYO5B mutations have been identified in MVID patients, but also in non-MVID patients. Animal models have been generated that formally prove the causality between MYO5B and MVID. Importantly, mutations in two other genes, STXBP2 and STX3, have since been associated with variants of MVID, shedding new light on the pathogenesis of this congenital diarrheal disorder. Here, we review these additional genes and their mutations. Furthermore, we discuss recent data from cell studies that indicate that the three genes are functionally linked and, therefore, may constitute a common disease mechanism that unifies a subset of phenotypically linked congenital diarrheal disorders. We present new data based on patient material to support this. To congregate existing and future information on MVID geno-/phenotypes, we have updated and expanded the MVID registry to include all currently known MVID-associated gene mutations, their demonstrated or predicted functional consequences, and associated clinical information.

    Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration

    BACKGROUND: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analyses or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of many different file formats and lack of clarity about which genomic strand is used as reference. FINDINGS: Genotype Harmonizer (GH) is a command-line tool to harmonize genetic datasets by automatically solving issues concerning genomic strand and file format. GH solves the unknown strand issue by aligning ambiguous A/T and G/C SNPs to a specified reference, using linkage disequilibrium patterns without prior knowledge of the used strands. GH supports many common GWAS/NGS genotype formats including PLINK, binary PLINK, VCF, SHAPEIT2 & Oxford GEN. GH is implemented in Java and a large part of the functionality can also be used as Java 'Genotype-IO' API. All software is open source under license LGPLv3 and available from http://www.molgenis.org/systemsgenetics. CONCLUSIONS: GH can be used to harmonize genetic datasets across different file formats and can be easily integrated as a step in routine meta-analysis and imputation pipelines
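
The strand issue described in this abstract can be illustrated with a minimal Python sketch. It is not Genotype Harmonizer's code or its Genotype-IO API; the function names are hypothetical. It covers only the simple case in which a SNP's alleles identify the strand on their own, and it merely flags ambiguous A/T and G/C SNPs, which the abstract says GH resolves using linkage disequilibrium patterns.

```python
# Hypothetical illustration only; not the Genotype Harmonizer implementation.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(alleles):
    """Return the alleles as read from the opposite strand."""
    return tuple(COMPLEMENT[a] for a in alleles)


def is_ambiguous(alleles):
    """A/T and C/G SNPs carry the same allele set on both strands."""
    return set(alleles) == set(complement(alleles))


def align_to_reference(study, reference):
    """Express study alleles on the reference strand.

    Returns None for ambiguous SNPs, because the alleles alone cannot
    reveal the strand (extra evidence such as LD would be needed).
    """
    if is_ambiguous(study):
        return None
    if set(study) == set(reference):
        return tuple(study)          # already on the reference strand
    flipped = complement(study)
    if set(flipped) == set(reference):
        return flipped               # strand flip resolves the mismatch
    raise ValueError("alleles do not match the reference on either strand")
```

For example, `align_to_reference(("A", "G"), ("T", "C"))` flips the SNP to `("T", "C")`, whereas an A/T SNP is returned as `None` and left for a method that uses additional evidence.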

    Therapeutic prospects of exon skipping for epidermolysis bullosa

    Epidermolysis bullosa (EB) is a group of genetic skin conditions characterized by abnormal skin (and mucosal) fragility caused by pathogenic variants in various genes. The disease severity ranges from early childhood mortality in the most severe types to occasional acral blistering in the mildest types. The subtype and severity of EB is linked to the gene involved and the specific variants in that gene, which also determine its mode of inheritance. Current treatment is mainly focused on symptomatic relief such as wound care and blister prevention, because truly curative treatment options are still at the preclinical stage. Given the current level of understanding, the broad spectrum of genes and variants underlying EB makes it impossible to develop a single treatment strategy for all patients. It is likely that many different variant-specific treatment strategies will be needed to ultimately treat all patients. Antisense-oligonucleotide (ASO)-mediated exon skipping aims to counteract pathogenic sequence variants by restoring the open reading frame through the removal of the mutant exon from the pre-messenger RNA. This should lead to the restored production of the protein absent in the affected skin and, consequently, improvement of the phenotype. Several preclinical studies have demonstrated that exon skipping can restore protein production in vitro, in skin equivalents, and in skin grafts derived from EB-patient skin cells, indicating that ASO-mediated exon skipping could be a viable strategy as a topical or systemic treatment. The potential value of exon skipping for EB is supported by a study showing reduced phenotypic severity in patients who carry variants that result in natural exon skipping. In this article, we review the substantial progress made on exon skipping for EB in the past 15 years and highlight the opportunities and current challenges of this RNA-based therapy approach. In addition, we present a prioritization strategy for the development of exon skipping based on genomic information of all EB-involved genes.

    SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data

    There is an urgent need to standardize the semantics of biomedical data values, such as phenotypes, to enable comparative and integrative analyses. However, it is unlikely that all studies will use the same data collection protocols. As a result, retrospective standardization is often required, which involves matching original (unstructured or locally coded) data to widely used coding or ontology systems such as SNOMED CT (clinical terms), ICD-10 (International Classification of Disease) and HPO (Human Phenotype Ontology). This data curation is usually time-consuming and performed by a human expert. To help mechanize this process, we have developed SORTA, a computer-aided system for rapidly encoding free text or locally coded values to a formal coding system or ontology. SORTA matches original data values (uploaded in semicolon-delimited format) to a target coding system (uploaded in Excel spreadsheet, OWL ontology web language or OBO open biomedical ontologies format). It then semi-automatically shortlists candidate codes for each data value using Lucene and n-gram based matching algorithms, and can also learn from matches chosen by human experts. We evaluated SORTA's applicability in two use cases. For the LifeLines biobank, we used SORTA to recode 90 000 free text values (including 5211 unique values) about physical exercise to MET (Metabolic Equivalent of Task) codes. For the CINEAS clinical symptom coding system, we used SORTA to map to HPO, enriching HPO when necessary (315 terms matched so far). Out of the shortlists at rank 1, we found a precision/recall of 0.97/0.98 in LifeLines and of 0.58/0.45 in CINEAS. More importantly, users found the tool both a major time saver and a quality improvement because SORTA reduced the chances of human mistakes. Thus, SORTA can dramatically ease data (re)coding tasks and we believe it will prove useful for many more projects.
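
The n-gram based shortlisting mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it scores candidates with a character-bigram Dice coefficient, which may differ from SORTA's actual algorithm, it leaves out the Lucene step and the learning from expert choices, and the codes `X1` to `X3` and their labels are hypothetical.

```python
# Hypothetical illustration of n-gram matching; not SORTA's implementation.
def bigrams(text):
    """Set of character bigrams after lowercasing and collapsing whitespace."""
    s = " ".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient between the bigram sets of two strings (0.0 to 1.0)."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def shortlist(value, codes, top=3):
    """Rank candidate codes for a free-text value, best match first."""
    scored = [(dice(value, label), code, label) for code, label in codes.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


# Made-up target coding system used only for this example.
codes = {"X1": "cycling", "X2": "swimming", "X3": "running"}
best = shortlist("cyclng", codes)[0]  # misspelled free-text value
```

Here the misspelled value `cyclng` shares four of its five bigrams with `cycling`, so `X1` is ranked first. In a semi-automatic workflow such as the one the abstract describes, a human expert would still confirm or reject the top candidates.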